16 research outputs found

Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach

Author: A Airola
A Culotta
AP Manine
AR Aronson
BJ Grosz
C Jacquemin
C Nédellec
D Bollegala
D Field
D Zelenko
G Erkan
I Segura-Bedmar
JD Kim
JO Korbel
K Fundel
K Liolios
M Torii
N Kambhatla
Pierre Warnier
R Bossy
S Aubin
S Lappin
SA Kripke
SP Lapage
T Hamon
T Ono
Wiktoria Golik
Y Lin
Z GuoDong
Zorana Ratkovic
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field.

Crossref

Springer - Publisher Connector

PubMed Central

HAL Descartes

Hal-Diderot

Deep analysis in IQA: evaluation on real users dialogues.

Author: Ratkovic Zorana
Publication date: 01/01/2009

Interactive Question Answering (IQA) is a natural and cohesive way for a user to obtain information by interacting with a system using natural language. With the advancement in Natural Language Processing, research in the field of IQA has started to focus on the role of semantics and the discourse structure in these systems. The need for a deeper analysis, which examines the syntax and semantics of the questions and the answers, is evident. Using this deeper analysis allows us to model the context of the interaction. I will look at a current closed-domain IQA system which is based on Linear Regression modeling. This system uses superficial and non-semantically motivated features. I propose adding deep analysis and semantic features in order to improve the system and show the need for such analysis. Particular attention will be placed on the so-called follow-up questions (questions that the user poses after having received some answer from the system) and the role of context. I propose that adding the linguistically heavy features will prove beneficial, thereby showing the need for such analysis in IQA systems.

National Repository of Grey Literature

Analyse prédicative pour l'extraction d'information : application au domaine de la biologie

Author: Ratkovic Zorana
Publication date: 11/12/2014

The abundance of biomedical information expressed in natural language has resulted in the need for methods to process this information automatically. In the field of Natural Language Processing (NLP), Information Extraction (IE) focuses on the extraction of relevant information from unstructured data in natural language. Many IE methods today focus on Machine Learning (ML) approaches that rely on deep linguistic processing in order to capture the complex information contained in biomedical texts. In particular, syntactic analysis and parsing have played an important role in IE, by helping capture how words in a sentence are related. This thesis examines how dependency parsing can be used to facilitate IE. It focuses on a task-based approach to dependency parsing evaluation and parser selection, including a detailed error analysis. In order to achieve a high quality of syntax-based IE, different stages of linguistic processing are addressed, including both pre-processing steps (such as tokenization) and the use of complementary linguistic processing (such as the use of semantics and coreference analysis). This thesis also explores how the different levels of linguistic processing can be represented for use within an ML-based IE algorithm, and how the interface between these two is of great importance. Finally, biomedical data is very heterogeneous, encompassing different subdomains and genres. This thesis explores how subdomain adaptation can be achieved by using already existing subdomain knowledge and resources. The methods and approaches described are explored using two different biomedical corpora, demonstrating how the IE results are used in real-life tasks.

HAL Descartes

Theses.fr

Deep analysis in IQA: evaluation on real users dialogues.

Author: Ratkovic Zorana
Publication date: 01/01/2009

CU Digital Repository

National Repository of Grey Literature

Analyse prédicative pour l’extraction d’information : application au domaine de la biologie

Author: Ratkovic Zorana
Publication venue: HAL CCSD
Publication date: 11/12/2014

The abundance of biomedical information expressed in natural language has resulted in the need for methods to process this information automatically. In the field of Natural Language Processing (NLP), Information Extraction (IE) focuses on the extraction of relevant information from unstructured data in natural language. Many IE methods today focus on Machine Learning (ML) approaches that rely on deep linguistic processing in order to capture the complex information contained in biomedical texts. In particular, syntactic analysis and parsing have played an important role in IE, by helping capture how words in a sentence are related. This thesis examines how dependency parsing can be used to facilitate IE. It focuses on a task-based approach to dependency parsing evaluation and parser selection, including a detailed error analysis. In order to achieve a high quality of syntax-based IE, different stages of linguistic processing are addressed, including both pre-processing steps (such as tokenization) and the use of complementary linguistic processing (such as the use of semantics and coreference analysis). This thesis also explores how the different levels of linguistic processing can be represented for use within an ML-based IE algorithm, and how the interface between these two is of great importance. Finally, biomedical data is very heterogeneous, encompassing different subdomains and genres. This thesis explores how subdomain adaptation can be achieved by using already existing subdomain knowledge and resources. The methods and approaches described are explored using two different biomedical corpora, demonstrating how the IE results are used in real-life tasks.

The thesis fits within the context described above: it explores techniques for acquiring lexical knowledge from texts, for both theoretical and applied purposes. The analysis will focus in particular on the verbal predicate and its nominalizations, since the predicate plays an essential role in NLP applications (event detection, information extraction, etc.). For example, we will look at the acquisition of subcategorization frames and selectional restrictions in order to identify families of verbs with similar syntactic and semantic behavior. The planned strategy is strongly inspired by the work of Z. Harris and his colleagues (Harris 1951, 1988; Harris et al., 1989). Harris showed that technical texts do not use the full complexity of the language but instead make use of "sublanguages". A sublanguage has a specialized vocabulary and a simplified syntax compared with everyday language. Specialized texts therefore exhibit regularities that can be studied through distributional analysis (to simplify: elements appearing in similar contexts have similar, or at least close, meanings). However, distributional analysis can only work if the text has been "cleaned" of surface linguistic variation. A pre-analysis of the texts is therefore crucial.

HAL Descartes

Improving term extraction with linguistic analysis in the biomedical domain

Author: Bossy Robert
Golik Wiktoria
Nédellec Claire
Ratkovic Zorana
Publication venue: National Polytechnic Institute
Publication date: 01/01/2013

As of 31/01/2014, this publication is in "draft version". This paper presents a linguistic-based approach to term extraction in the biomedical domain. The method is based on a linguistic analysis of constraints on terms and their context, focusing on participles and prepositional complements. The purpose of our approach is to obtain terms that are relevant for knowledge acquisition applications, such as the creation and enrichment of terminologies and ontologies. We report on the evaluations conducted following two complementary strategies, using a reference terminology and a manual validation. They were applied to two corpora of differing genre and domain, namely pharmacology patents and animal physiology scientific articles. Our work shows that the linguistic analysis-based developments significantly improve extraction results. The method is especially efficient when dealing with gerunds and "to" prepositional modifier

HAL Descartes

Improving term extraction with linguistic analysis in the biomedical domain

Author: Bossy Robert
Golik Wiktoria
Nédellec Claire
Ratkovic Zorana
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 24/03/2013

This paper presents a linguistic-based approach to term extraction in the biomedical domain. The method is based on a linguistic analysis of constraints on terms and their context, focusing on participles and prepositional complements. The purpose of our approach is to obtain terms that are relevant for knowledge acquisition applications, such as the creation and enrichment of terminologies and ontologies. We report on the evaluations conducted following two complementary strategies, using a reference terminology and a manual validation. They were applied to two corpora of differing genre and domain, namely pharmacology patents and animal physiology scientific articles. Our work shows that the linguistic analysis-based developments significantly improve extraction results. The method is especially efficient when dealing with gerunds and "to" prepositional modifier

HAL Descartes

Improving term extraction with linguistic analysis in the biomedical domain

Author: Bossy Robert
Golik Wiktoria
Nédellec Claire
Ratkovic Zorana
Publication venue: Springer
Publication date: 01/01/2013

This paper presents a linguistic-based approach to term extraction in the biomedical domain. The method is based on a linguistic analysis of constraints on terms and their context, focusing on participles and prepositional complements. The purpose of our approach is to obtain terms that are relevant for knowledge acquisition applications, such as the creation and enrichment of terminologies and ontologies. We report on the evaluations conducted following two complementary strategies, using a reference terminology and a manual validation. They were applied to two corpora of differing genre and domain, namely pharmacology patents and animal physiology scientific articles. Our work shows that the linguistic analysis-based developments significantly improve extraction results. The method is especially efficient when dealing with gerunds and "to" prepositional modifier

HAL Descartes

ProdInra